-- remark by Antonio Torralba (after his third beer)
Overview
Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements, our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.
The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about how data and learning could be used to tackle it. The central focus will be on the representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing), a crucial glue tying vision together that has been largely neglected. The course will emphasize using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.
The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.
Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)
We will meet on Mondays and Wednesdays Noon-1:20pm in Wean 5409.
Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
TA: Tomasz Malisiewicz, Smith Hall 232.
Projects
Check out this list of data sources for some ideas on where to get images to work with.
Challenges: Each project team will have regular meetings to discuss the progress of their course project.
Meeting times are listed on the project meeting schedule.
Paper Discussion
Leave your comments about papers on the Class Blog.
Paper List
The paper list contains papers that will be discussed in class.
Schedule
Introduction
Date | Presenter | Paper title | Slides |
Jan. 12 | Alyosha Efros | Introduction, Vision: Measurement vs. Perception. Administrative stuff, overview of the course, datasets | Intro ppt |
Jan. 14 | Alyosha Efros | Overview lecture on theories of Visual Perception. Cavanagh, P. (1995) Vision is getting easier every day. Optional reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? | Theories ppt |
Jan. 19 | MLK Jr. Day, no class | | |
Jan. 21 | Alyosha Efros | Overview lecture on the physiology of vision. Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision | Physiology ppt |
Jan. 26 | Alyosha Efros | What should be done at the Low level? | Low Level ppt |
Jan. 28 | Varun | Probability of Boundary: Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues. D. Martin, C. Fowlkes, and J. Malik. PAMI, May 2004. Using Contours to Detect and Localize Junctions in Natural Images. M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. CVPR 2008. | Global Pb pdf |
Feb. 2 | Varun/Alyosha | Probability of Boundary (continued). When is object/scene recognition just texture recognition? | |
Feb. 4 | Alyosha Efros | When is object/scene recognition just texture recognition? When is scene recognition just texture recognition? Renninger, L.W. & Malik, J. Vision Research 2004. Visual categorization with bags of keypoints. Csurka, G., Bray, C., Dance, C., and Fan, L. ECCV 2004. Object Categorization by Learned Universal Visual Dictionary. Winn, J., Criminisi, A. and Minka, T. ICCV 2005. | Bag of Words ppt |
Feb. 9 | Dan | TextonBoost Day. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. J. Shotton, J. Winn, C. Rother, A. Criminisi. In Proc. ECCV 2006. (optional) Journal version of TextonBoost. TextonBoost Code | TextonBoost+STF pdf, TextonBoost+STF ppt |
Feb. 11 | Dan/Alyosha | Semantic Texton Forests: Semantic Texton Forests for Image Categorization and Segmentation. J. Shotton, M. Johnson, R. Cipolla. In Proc. IEEE CVPR 2008. Semantic Texton Forests implementation. Intro to objects: Geometry vs. Appearance. Object Recognition in the Geometric Era: a Retrospective. J. Mundy. 2006. | (link is above) |
Feb. 16 | James Hays | Large Scale Scene Matching for Graphics and Vision | |
Feb. 18 | Alyosha | Appearance makes an appearance: Sliding windows, constellation models, pictorial structures, and more. | Objects and Parts ppt |
Feb. 23 | Edward | Parts-Based Object Recognition. A Discriminatively Trained, Multiscale, Deformable Part Model. P. Felzenszwalb, D. McAllester, D. Ramanan. In Proc. IEEE CVPR 2008. code | Latent pdf |
Feb. 25 | Alyosha | Introduction to Context | Context |
March 2 | Michael Tarr | Uncovering the Fundamental Principles of Visual Cortex | |
March 4 | Brian | Object Recognition by Scene Alignment. B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman. In NIPS, 2007. code for gist descriptor. SIFT flow: dense correspondence across different scenes. C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman. ECCV, 2008. project page | Stealing Objects with Computer Vision |
March 16 | Ekaterina | Contextual priming for object detection. A. Torralba. IJCV, Vol. 53(2), 169-191, 2003. Object detection and localization using local and global features. K. Murphy, A. Torralba, D. Eaton, W. T. Freeman. Sicily workshop on object recognition, 2005. (see also The context challenge) | Context Challenge slides |
March 18 | Alyosha / Utsav | Introduction to Segmentation. Objects in Context. Andrew Rabinovich, Andrea Vedaldi, Carolina Galleguillos, Eric Wiewiora and Serge Belongie. ICCV 2007. Context Based Object Categorization: A Critical Survey. Carolina Galleguillos and Serge Belongie. Technical Report UCSD CS2008-0928, 2008. | Segmentation |
Friday March 20, NSH 1109 | Utsav / Alyosha | Context Continued... Object Categorization using Co-Occurrence, Location and Appearance. Carolina Galleguillos, Andrew Rabinovich and Serge Belongie. CVPR 2008. Segmentation Continued... Recovering Human Body Configurations: Combining Segmentation and Recognition. G. Mori, X. Ren, A. Efros, and J. Malik. CVPR 2004. | Objects in Context |
March 23 | Pyry | Learning a Classification Model for Segmentation. Xiaofeng Ren and Jitendra Malik. ICCV 2003. project page. Image Segmentation by Data-Driven Markov Chain Monte Carlo. Z. Tu and S. C. Zhu. PAMI, vol. 24, no. 5, pp. 657-673, May 2002. project page | Segmentation Through Optimization |
March 25 | Alyosha | Surfaces. On the semantics of a glance at a scene. Biederman, I. 1981. Recovering Surface Layout from an Image. D. Hoiem, A.A. Efros, and M. Hebert. IJCV, Vol. 75, No. 1, October 2007. See also classic papers: Yakimovsky and Feldman (1973), Ohta, Kanade, Sakai (1978), Barrow and Tenenbaum (1978). | It's a 3D world, after all! |
March 30 | Alyosha | Occlusion and Figure/Ground Reasoning. Figure/Ground Assignment in Natural Images. Xiaofeng Ren, Charless Fowlkes and Jitendra Malik. ECCV 2006. Project Page. Recovering Occlusion Boundaries from a Single Image. D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert. ICCV 2007. | Occlusions |
April 1 | Jiyan | Depth estimation from image structure. A. Torralba, A. Oliva. PAMI Vol. 24(9): 1226-1238, 2002. Depth Information by Stage Classification. Vladimir Nedovic, Arnold W.M. Smeulders, Andre Redert and Jan-Mark Geusebroek. ICCV 2007. Learning Depth from Single Monocular Images. Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 2005. | Learning Depth |
April 6 | Mark | Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. Chum, O., Philbin, J., Sivic, J., Isard, M. and Zisserman, A. In ICCV 2007. | Content Based Image Search |
April 8 | Alyosha | Categorization. Principles of Categorization. Eleanor Rosch. Big Book of Concepts, Chapter 3. Gregory L. Murphy. (just focus on the "Exemplar View" section) | Concepts: from Instances to Meaning |
April 10, 3:30pm, NSH 1305 | Derek Hoiem | Inferring Object Attributes | |
April 13 | Yuandong | Sharing visual features for multiclass and multiview object detection. A. Torralba, K. P. Murphy and W. T. Freeman. PAMI, vol. 29, no. 5, pp. 854-869, May 2007. Sharing Features Code | Sharing Slides |
April 15 | Zhaoyin | Learning compositional models for object categories from small sample sets. J. Porway, B. Yao, and S.C. Zhu. Book chapter in Sven Dickinson et al. (eds.), Object Categorization: Computer and Human Vision Perspectives, Cambridge University Press, 2009. A Stochastic Grammar of Images. Song-Chun Zhu and David Mumford. Foundations and Trends in Computer Graphics and Vision, Vol. 2, No. 4, 2007. | Grammar Slides |
April 20 | Alyosha and Scott | Learning Realistic Human Actions from Movies. Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld. In Proc. CVPR'08. project page | video, Action Slides |
April 22 | Alyosha | The Unreasonable Effectiveness of Data and the Wisdom of Crowds | data |
April 27 | Alyosha + everyone | How do we know that we have solved vision? | Solving Vision |
April 29 | Project Presentations (1-4) | | |
April 30, 6-8pm in NSH 3002 | Project Presentations (5-10) | | |
Similar Courses
This course has been inspired by those offered by several of my colleagues. Here is a partial list:
- Visual Recognition and Search (Kristen Grauman, Texas-Austin, Spring 2009)
- Visual Scene Understanding (Derek Hoiem, UIUC, Spring 2009)
- Statistical Models for Visual Recognition (Deva Ramanan, UCI, Winter 2009)
- Object Recognition and Scene Understanding (Antonio Torralba, MIT, Fall 2008)
- Scene Understanding Seminar (Aude Oliva, MIT, Fall 2008)
- Selected Topics in Vision & Learning (Serge Belongie, UCSD, Fall 2006)
- Learning and Inference in Vision (Bill Freeman, MIT)
- High-level Recognition in Computer Vision (Fei-Fei Li, Princeton)
- Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
- Recognition Problems in Computer Vision (Greg Mori, SFU, Fall 2007)
- Visual Recognition (Pietro Perona, Caltech)
- Vision and Learning (Jianbo Shi, UPenn)